Two Questions about Data-Oriented Parsing

Author

  • Rens Bod

Abstract

In this paper I present ongoing work on the data-oriented parsing (DOP) model. In previous work, DOP was tested on a cleaned-up set of analyzed part-of-speech strings from the Penn Treebank, achieving excellent test results. This left, however, two important questions unanswered: (1) how does DOP perform if tested on unedited data, and (2) how can DOP be used for parsing word strings that contain unknown words? This paper addresses these questions. We show that parse results on unedited data are worse than on cleaned-up data, although still very competitive when compared to other models. As to the parsing of word strings, we show that the hardness of the problem depends not so much on unknown words as on previously unseen lexical categories of known words. We give a novel method for parsing these words by estimating the probabilities of unknown subtrees. The method is of general interest since it shows that good performance can be obtained without the use of a part-of-speech tagger. To the best of our knowledge, our method outperforms other statistical parsers tested on Penn Treebank word strings.
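The abstract refers to DOP's subtree probabilities without spelling them out. The sketch below is a minimal illustration, assuming the relative-frequency estimator of Bod's earlier DOP1 model and a toy tuple-based tree encoding (both assumptions, not this paper's code). It extracts every subtree (fragment) of a treebank tree, assigns P(f) = count(f) / Σ count(f') over fragments f' with the same root label, and multiplies fragment probabilities along one derivation. The helper `good_turing_unseen_mass` is a hypothetical illustration of one standard way to reserve probability mass for unknown subtrees (Good-Turing, N1/N); the abstract does not name the estimator the paper actually uses.

```python
from collections import Counter
from itertools import product
from math import prod

# Toy encoding (an assumption): a tree is a tuple (label, child, ...), where a
# child is another tree or a terminal word (str). A 1-tuple (label,) marks a
# frontier nonterminal, i.e. an open substitution site inside a fragment.

def fragments_at(node):
    """All DOP fragments rooted at `node`: each nonterminal child is either
    cut (left as a frontier (label,)) or expanded with one of its fragments."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # terminal words are always kept
        else:
            options.append([(child[0],)] + fragments_at(child))
    return [(label, *combo) for combo in product(*options)]

def all_fragments(tree):
    """Fragments rooted at every nonterminal node of `tree`."""
    result = fragments_at(tree)
    for child in tree[1:]:
        if not isinstance(child, str):
            result.extend(all_fragments(child))
    return result

def fragment_probabilities(treebank):
    """DOP1-style estimate: count(f) divided by the total count of all
    fragments sharing f's root label."""
    counts = Counter(f for tree in treebank for f in all_fragments(tree))
    root_totals = Counter()
    for frag, c in counts.items():
        root_totals[frag[0]] += c
    return {frag: c / root_totals[frag[0]] for frag, c in counts.items()}

def derivation_probability(derivation, probs):
    """Probability of one derivation: the product of its fragment probabilities."""
    return prod(probs[frag] for frag in derivation)

def good_turing_unseen_mass(counts):
    """Hypothetical helper: Good-Turing mass reserved for unseen events, N1 / N,
    where N1 is the number of types seen exactly once and N the total count."""
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / sum(counts.values())
```

Even a five-node toy tree such as `("S", ("NP", ("N", "John")), ("VP", ("V", "sleeps")))` yields 15 fragments (9 of them rooted at S), and the number grows exponentially with tree size, so practical DOP parsers do not enumerate fragments this naively. Note also that DOP assigns a parse the sum of the probabilities of all derivations producing it; the sketch scores a single derivation only.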

Similar resources

A View of Parsing

The questions before this panel presuppose a distinction between parsing and interpretation. There are two other simple and obvious distinctions that I think are necessary for a reasonable discussion of the issues. First, we must clearly distinguish between the static specification of a process and its dynamic execution. Second, we must clearly distinguish two purposes that a natural language p...

A U-DOP approach to modeling language acquisition

In linguistics, there is a debate between empiricists and nativists: the former believe that language is acquired from experience, the latter that there is an innate component for language. The main arguments adduced by nativists are Arguments from Poverty of Stimulus. It is claimed that children acquire certain phenomena, which they cannot learn on the basis of experience alone —and therefore,...

Semantic Case Analysis of Informal Requirements

Case grammars provide a natural basis for an object-oriented analysis of software requirements. Two important areas of object-oriented requirements analysis are addressed: (1) identification of entities which should be modeled as objects in the software design; and (2) detection of inconsistencies in the requirements documents. Available heuristics to identify these entities are based on intuiti...

Polynomial Tree Substitution Grammars: an efficient framework for Data-Oriented Parsing

Finding the most probable parse tree in the framework of Data-Oriented Parsing (DOP), a Stochastic Tree Substitution Parsing scheme developed by R. Bod (Bod 92), has proven to be NP-hard in the most general case (Sima’an 96a). However, introducing some a priori restrictions on the choice of the elementary trees (i.e. grammar rules) leads to interesting DOP instances with polynomial time-complex...

Journal title:
  • CoRR

Volume: cmp-lg/9606022

Publication date: 1996